BIOTECHNOLOGY SYSTEMS BRANCH



RAW SEQUENCE LISTING 
ERROR REPORT 




The Biotechnology Systems Branch of the Scientific and Technical Information
Center (STIC) detected errors when processing the following computer readable 
form: 

Application Serial Number: 

Source: [illegible]

Date Processed by STIC: [illegible]

THE ATTACHED PRINTOUT EXPLAINS DETECTED ERRORS.
PLEASE FORWARD THIS INFORMATION TO THE APPLICANT BY EITHER: 

1) INCLUDING A COPY OF THIS PRINTOUT IN YOUR NEXT COMMUNICATION TO THE 
APPLICANT, WITH A NOTICE TO COMPLY or, 

2) TELEPHONING APPLICANT AND FAXING A COPY OF THIS PRINTOUT, WITH A 
NOTICE TO COMPLY 

FOR CRF SUBMISSION AND PATENTIN SOFTWARE QUESTIONS, PLEASE CONTACT
MARK SPENCER, TELEPHONE: 703-308-4212; FAX: 703-308-4221
Effective 12/13/03: TELEPHONE: 571-272-2510; FAX: 571-273-0221



TO REDUCE ERRORED SEQUENCE LISTINGS, PLEASE USE THE CHECKER
VERSION 4.1 PROGRAM, ACCESSIBLE THROUGH THE U.S. PATENT AND
TRADEMARK OFFICE WEBSITE. SEE BELOW FOR ADDRESS:

http://www.uspto.gov/web/offices/pac/checker/chkr41note.htm



Applicants submitting genetic sequence information electronically on diskette or CD-Rom should be aware that there is
a possibility that the disk/CD-Rom may have been affected by treatment given to all incoming mail.
Please consider using alternate methods of submission for the disk/CD-Rom or replacement disk/CD-Rom.
Any reply including a sequence listing in electronic form should NOT be sent to the 20231 zip code address for the
United States Patent and Trademark Office, and instead should be sent via the following to the indicated addresses:

1. EFS-Bio (<http://www.uspto.gov/ebc/efs/downloads/documents.htm>, EFS Submission
   User Manual - ePAVE)

2. U.S. Postal Service: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450

3. Hand Carry directly to (EFFECTIVE 12/01/03):
   U.S. Patent and Trademark Office, Box Sequence, Customer Window, Lobby, Room 1B03, Crystal Plaza Two,
   2011 South Clark Place, Arlington, VA 22202

4. Federal Express, United Parcel Service, or other delivery service to: U.S. Patent and Trademark Office,
   Box Sequence, Room 4B03-Mailroom, Crystal Plaza Two, 2011 South Clark Place, Arlington, VA 22202



Revised 10/08/03 



Raw Sequence Listing Error Summary 



ERROR DETECTED    SUGGESTED CORRECTION    SERIAL NUMBER: [illegible]

ATTN: NEW RULES CASES: PLEASE DISREGARD ENGLISH "ALPHA" HEADERS, WHICH WERE INSERTED BY PTO SOFTWARE

1   Wrapped Nucleics / Wrapped Aminos
    The number/text at the end of each line "wrapped" down to the next line. This may occur if your file
    was retrieved in a word processor after creating it. Please adjust your right margin to .3; this will
    prevent "wrapping."

2   Invalid Line Length
    The rules require that a line not exceed 72 characters in length. This includes white spaces.

3   Misaligned Amino Numbering
    The numbering under each 5th amino acid is misaligned. Do not use tab codes between numbers;
    use space characters, instead.

4   Non-ASCII
    The submitted file was not saved in ASCII (DOS) text, as required by the Sequence Rules. Please
    ensure your subsequent submission is saved in ASCII text.

5   Variable Length
    Sequence(s) contain n's or Xaa's representing more than one residue. Per Sequence Rules,
    each n or Xaa can only represent a single residue. Please present the maximum number of each
    residue having variable length and indicate in the <220>-<223> section that some may be missing.

6   PatentIn 2.0 "bug"
    A "bug" in PatentIn version 2.0 has caused the <220>-<223> section to be missing from amino acid
    sequence(s). Normally, PatentIn would automatically generate this section from the
    previously coded nucleic acid sequence. Please manually copy the relevant <220>-<223> section to
    the subsequent amino acid sequence. This applies to the mandatory <220>-<223> sections for
    Artificial or Unknown sequences.

7   Skipped Sequences (OLD RULES)
    Sequence(s) missing. If intentional, please insert the following lines for each skipped sequence:

    (2) INFORMATION FOR SEQ ID NO:X: (insert SEQ ID NO where "X" is shown)
    (i) SEQUENCE CHARACTERISTICS: (Do not insert any subheadings under this heading)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:X: (insert SEQ ID NO where "X" is shown)
    This sequence is intentionally skipped

    Please also adjust the "(iii) NUMBER OF SEQUENCES:" response to include the skipped sequences.

8   Skipped Sequences (NEW RULES)
    Sequence(s) missing. If intentional, please insert the following lines for each skipped sequence:

    <210> sequence id number
    <400> sequence id number
    000

9   Use of n's or Xaa's (NEW RULES)
    Use of n's and/or Xaa's has been detected in the Sequence Listing.
    Per 1.823 of Sequence Rules, use of <220>-<223> is MANDATORY if n's or Xaa's are present.
    In the <220> to <223> section, please explain the location of n or Xaa, and which residue n or Xaa represents.

10  Invalid <213> Response
    Per 1.823 of Sequence Rules, the only valid <213> responses are: Unknown, Artificial Sequence, or
    scientific name (Genus/species). A <220>-<223> section is required when the <213> response is Unknown or
    is Artificial Sequence.

11  Use of <220>
    Sequence(s) missing the <220> "Feature" and associated numeric identifiers and responses.
    Use of <220> to <223> is MANDATORY if the <213> "Organism" response is "Artificial Sequence" or
    "Unknown." Please explain the source of genetic material in the <220> to <223> section.
    (See "Federal Register," 06/01/1998, Vol. 63, No. 104, pp. 29631-32) (Sec. 1.823 of Sequence Rules)

12  PatentIn 2.0 "bug"
    Please do not use the "Copy to Disk" function of PatentIn version 2.0. This causes a corrupted file,
    resulting in missing mandatory numeric identifiers and responses (as indicated on raw sequence
    listing). Instead, please use "File Manager" or any other manual means to copy the file to floppy disk.

13  Misuse of n/Xaa
    "n" can only represent a single nucleotide; "Xaa" can only represent a single amino acid.
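
Three of the format errors listed above (invalid line length, tab codes, and non-ASCII text) can be screened for before a listing is submitted. The following is a minimal sketch only, not the Office's Checker program; the function name `check_listing_lines` and the message wording are assumptions made for illustration.

```python
def check_listing_lines(text, max_len=72):
    """Return (line_number, problem) pairs for a sequence listing held in a string.

    Hypothetical helper: screens only for the three file-format errors
    described in the error summary (line length, tabs, non-ASCII text).
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # The rules limit a line to 72 characters, white space included.
        if len(line) > max_len:
            problems.append((number, f"line is {len(line)} characters; limit is {max_len}"))
        # Tab codes misalign the numbering under amino acids; spaces are required.
        if "\t" in line:
            problems.append((number, "tab character found; use spaces"))
        # The file must be plain ASCII text.
        if not line.isascii():
            problems.append((number, "non-ASCII character found"))
    return problems
```

Calling `check_listing_lines(open(path, encoding="latin-1").read())` on a listing file, where `path` is whatever file is about to be submitted, would return an empty list when none of these three problems is present. It does not check the content of any numeric identifier.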



AMC - Biotechnology Systems Branch - 09/09/2003 



RAW SEQUENCE LISTING 

PATENT APPLICATION: US/10/031,496B



DATE: 03/12/2004 
TIME: 14:56:14 



Input Set : A:\NREL 99-45.ST25.txt
Output Set: N:\CRF4\03122004\J031496B.raw 

3    <110> APPLICANT: National Renewable Energy Laboratory
5    <120> TITLE OF INVENTION: Cellobiohydrolase I Gene and Improved Variants
7    <130> FILE REFERENCE: NREL 99-45
9    <140> CURRENT APPLICATION NUMBER: 10/031,496B
10   <141> CURRENT FILING DATE: 2002-01-14
12   <160> NUMBER OF SEQ ID NOS: 120
14   <170> SOFTWARE: PatentIn version 3.2

16   <210> SEQ ID NO: 1
17   <211> LENGTH: 28
18   <212> TYPE: DNA
19   <213> ORGANISM: Synthetic DNA
21   <400> SEQUENCE: 1
22   agagagtcta gacacggagc ttacaggc

25   <210> SEQ ID NO: 2
26   <211> LENGTH: 35
27   <212> TYPE: DNA
28   <213> ORGANISM: Synthetic DNA
30   <400> SEQUENCE: 2
31   aaagaagcgc ggccgcgcct gcactctcca atcgg

34   <210> SEQ ID NO: 3
35   <211> LENGTH: 24
36   <212> TYPE: DNA
37   <213> ORGANISM: Synthetic DNA
39   <400> SEQUENCE: 3
40   ggcggaaacc cgcctggcac cacc

43   <210> SEQ ID NO: 4
44   <211> LENGTH: 1550
45   <212> TYPE: DNA
46   <213> ORGANISM: Trichoderma reesei
49   <220> FEATURE:
50   <221> NAME/KEY: misc_signal
51   <222> LOCATION: (1)..(51)
53   <220> FEATURE:
54   <221> NAME/KEY: CDS
55   <222> LOCATION: (3)..(1550)
57   <220> FEATURE:
58   <221> NAME/KEY: misc_feature
59   <222> LOCATION: (52)..(1344)
61   <220> FEATURE:
62   <221> NAME/KEY: misc_feature
63   <222> LOCATION: (1345)..(1435)
65   <220> FEATURE:
66   <221> NAME/KEY: misc_binding

file://C:\CRF4\Outhold\VsrJ031496B.htm 

67   <222> LOCATION: (1436)..(1550)
69   <400> SEQUENCE: 4

70   at gta tcg gaa gtt ggc cgt cat ctc ggc ctt ctt ggc cac agc tcg  47
71      Val Ser Glu Val Gly Arg His Leu Gly Leu Leu Gly His Ser Ser
72      1               5                   10                  15

74   tgc tca gtc ggc ctg cac tct cca atc gga gac tca ccc gcc tct gac  95
75   Cys Ser Val Gly Leu His Ser Pro Ile Gly Asp Ser Pro Ala Ser Asp
76                   20                  25                  30

78   atg gca gaa atg ctc gtc tgg tgg cac gtg cac tca aca gac agg ctc  143
79   Met Ala Glu Met Leu Val Trp Trp His Val His Ser Thr Asp Arg Leu
80               35                  40                  45

82   cgt ggt cat cga cgc caa ctg gcg ctg gac tca cgc tac gaa cag cag  191
83   Arg Gly His Arg Arg Gln Leu Ala Leu Asp Ser Arg Tyr Glu Gln Gln
84           50                  55                  60

86   cac gaa ctg cta cga tgg caa cac ttg gag ctc gac cct atg tcc tga  239
87   His Glu Leu Leu Arg Trp Gln His Leu Glu Leu Asp Pro Met Ser
88       65                  70                  75

90   caa cga gac ctg cgc gaa gaa ctg ctg tct gga cgg tgc cgc cta cgc  287
91   Gln Arg Asp Leu Arg Glu Glu Leu Leu Ser Gly Arg Cys Arg Leu Arg
92       80                  85                  90

94   gtc cac gta cgg agt tac cac gag cgg taa cag cct ctc cat tgg ctt  335
95   Val His Val Arg Ser Tyr His Glu Arg     Gln Pro Leu His Trp Leu
96   95                  100                     105

98   tgt cac cca gtc tgc gca gaa gaa cgt tgg cgc tcg cct tta cct tat  383
99   Cys His Pro Val Cys Ala Glu Glu Arg Trp Arg Ser Pro Leu Pro Tyr
100  110                 115                 120                 125

102  ggc gag cga cac gac cta cca gga att cac cct gct tgg caa cga gtt  431
103  Gly Glu Arg His Asp Leu Pro Gly Ile His Pro Ala Trp Gln Arg Val
104                  130                 135                 140

106  ctc ttt cga tgt tga tgt ttc gca gct gcc gtg cgg ctt gaa cgg agc  479
107  Leu Phe Arg Cys     Cys Phe Ala Ala Ala Val Arg Leu Glu Arg Ser
108              145                     150                 155

110  tct cta ctt cgt gtc cat gga cgc gga tgg tgg cgt gag caa gta tcc  527
111  Ser Leu Leu Arg Val His Gly Arg Gly Trp Trp Arg Glu Gln Val Ser
112              160                 165                 170

114  cac caa cac cgc tgg cgc caa gta cgg cac ggg gta ctg tga cag cca  575
115  His Gln His Arg Trp Arg Gln Val Arg His Gly Val Leu     Gln Pro
116          175                 180                 185

118  gtg tcc ccg cga tct gaa gtt cat caa tgg cca ggc caa cgt tga ggg  623
119  Val Ser Pro Arg Ser Glu Val His Gln Trp Pro Gly Gln Arg     Gly
120          190                 195                 200

122  ctg gga gcc gtc atc caa caa cgc gaa cac ggg cat tgg agg aca cgg  671
123  Leu Gly Ala Val Ile Gln Gln Arg Glu His Gly His Trp Arg Thr Arg
124          205                 210                 215

126  aag ctg ctg ctc tga gat gga tat ctg gga ggc caa ctc cat ctc cga  719
127  Lys Leu Leu Leu     Asp Gly Tyr Leu Gly Gly Gln Leu His Leu Arg
128      220                     225                 230

130  ggc tct tac ccc cca ccc ttg cac gac tgt cgg cca gga gat ctg cga  767
131  Gly Ser Tyr Pro Pro Pro Leu His Asp Cys Arg Pro Gly Asp Leu Arg

198  tgc ctg taa agc tcc  1550
199  Cys Leu     Ser Ser
200  500



203  <210> SEQ ID NO: 5
204  <211> LENGTH: 78
205  <212> TYPE: PRT
206  <213> ORGANISM: Trichoderma reesei
208  <400> SEQUENCE: 5

210  Val Ser Glu Val Gly Arg His Leu Gly Leu Leu Gly His Ser Ser Cys
211  1               5                   10                  15

214  Ser Val Gly Leu His Ser Pro Ile Gly Asp Ser Pro Ala Ser Asp Met
215              20                  25                  30

218  Ala Glu Met Leu Val Trp Trp His Val His Ser Thr Asp Arg Leu Arg
219          35                  40                  45

222  Gly His Arg Arg Gln Leu Ala Leu Asp Ser Arg Tyr Glu Gln Gln His
223      50                  55                  60

226  Glu Leu Leu Arg Trp Gln His Leu Glu Leu Asp Pro Met Ser
227  65                  70                  75

230  <210> SEQ ID NO: 6
231  <211> LENGTH: 25
232  <212> TYPE: PRT
233  <213> ORGANISM: Trichoderma reesei
235  <400> SEQUENCE: 6

237  Gln Arg Asp Leu Arg Glu Glu Leu Leu Ser Gly Arg Cys Arg Leu Arg
238  1               5                   10                  15

241  Val His Val Arg Ser Tyr His Glu Arg
242              20                  25

245  <210> SEQ ID NO: 7
246  <211> LENGTH: 42
247  <212> TYPE: PRT
248  <213> ORGANISM: Trichoderma reesei
250  <400> SEQUENCE: 7

252  Gln Pro Leu His Trp Leu Cys His Pro Val Cys Ala Glu Glu Arg Trp
253  1               5                   10                  15

256  Arg Ser Pro Leu Pro Tyr Gly Glu Arg His Asp Leu Pro Gly Ile His
257              20                  25                  30

260  Pro Ala Trp Gln Arg Val Leu Phe Arg Cys
261          35                  40

264  <210> SEQ ID NO: 8
265  <211> LENGTH: 40
266  <212> TYPE: PRT
267  <213> ORGANISM: Trichoderma reesei
269  <400> SEQUENCE: 8

271  Cys Phe Ala Ala Ala Val Arg Leu Glu Arg Ser Ser Leu Leu Arg Val
272  1               5                   10                  15

275  His Gly Arg Gly Trp Trp Arg Glu Gln Val Ser His Gln His Arg Trp
276              20                  25                  30

279  Arg Gln Val Arg His Gly Val Leu
280          35                  40

283  <210> SEQ ID NO: 9
284  <211> LENGTH: 16
285  <212> TYPE: PRT
286  <213> ORGANISM: Trichoderma reesei
288  <400> SEQUENCE: 9

290  Gln Pro Val Ser Pro Arg Ser Glu Val His Gln Trp Pro Gly Gln Arg
291  1               5                   10                  15

294  <210> SEQ ID NO: 10
295  <211> LENGTH: 21
296  <212> TYPE: PRT
297  <213> ORGANISM: Trichoderma reesei
299  <400> SEQUENCE: 10

301  Gly Leu Gly Ala Val Ile Gln Gln Arg Glu His Gly His Trp Arg Thr
302  1               5                   10                  15

305  Arg Lys Leu Leu Leu
306              20

309  <210> SEQ ID NO: 11
310  <211> LENGTH: 28
311  <212> TYPE: PRT
312  <213> ORGANISM: Trichoderma reesei
314  <400> SEQUENCE: 11

316  Asp Gly Tyr Leu Gly Gly Gln Leu His Leu Arg Gly Ser Tyr Pro Pro
317  1               5                   10                  15

320  Pro Leu His Asp Cys Arg Pro Gly Asp Leu Arg Gly
321              20                  25

324  <210> SEQ ID NO: 12
325  <211> LENGTH: 8
326  <212> TYPE: PRT
327  <213> ORGANISM: Trichoderma reesei
329  <400> SEQUENCE: 12

331  Trp Val Arg Arg Asn Leu Leu Arg
332  1               5

335  <210> SEQ ID NO: 13
336  <211> LENGTH: 69
337  <212> TYPE: PRT
338  <213> ORGANISM: Trichoderma reesei
340  <400> SEQUENCE: 13

342  Gln Ile Trp Arg His Leu Arg Ser Arg Trp Leu Arg Leu Glu Pro Ile
343  1               5                   10                  15

346  Pro Pro Gly Gln His Gln Leu Leu Arg Pro Trp Leu Lys Leu Tyr Pro
347              20                  25                  30

350  Arg Tyr His Gln Glu Ile Asp Arg Cys His Pro Val Arg Asp Val Gly
351          35                  40                  45

354  Cys His Gln Pro Ile Leu Cys Pro Glu Trp Arg His Phe Pro Ala Ala
355      50                  55                  60

358  Gln Arg Arg Ala Trp
359  65

362  <210> SEQ ID NO: 14
363  <211> LENGTH: 8
RAW SEQUENCE LISTING ERROR SUMMARY

PATENT APPLICATION: US/10/031,496B

DATE: 03/12/2004
TIME: 14:56:15

Input Set : A:\NREL 99-45.ST25.txt
Output Set: N:\CRF4\03122004\J031496B.raw

Please Note:

Use of n and/or Xaa has been detected in the Sequence Listing. Please review the
Sequence Listing to ensure that a corresponding explanation is presented in the <220>
to <223> fields of each sequence which presents at least one n or Xaa.

Seq#:4; Xaa Pos. 493
Seq#:18; Xaa Pos. 57
Seq#:32; Xaa Pos. 57
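
The review requested in the note above can be partly automated. The sketch below is an illustration under stated assumptions, not the Office's verification software: the function name `records_missing_223` is hypothetical, records are assumed to start at each `<210>` tag, and the residues are assumed to follow the `<400>` line. It only tests whether a `<223>` field exists in a record that contains n or Xaa; it cannot judge whether that field actually explains the residue.

```python
import re

def records_missing_223(listing_text):
    """Return SEQ ID numbers that contain n or Xaa but have no <223> field.

    Hypothetical helper. Accepts both '<210> 4' and the raw-listing form
    '<210> SEQ ID NO: 4' that carries the English headers added by PTO software.
    """
    flagged = []
    starts = [m.start() for m in re.finditer(r"<210>", listing_text)]
    for begin, end in zip(starts, starts[1:] + [len(listing_text)]):
        record = listing_text[begin:end]
        id_match = re.match(r"<210>\s*(?:SEQ ID NO:\s*)?(\d+)", record)
        if id_match is None:
            continue  # <210> tag without a readable sequence number
        head, tag, body = record.partition("<400>")
        if not tag:
            continue  # record has no residue section
        residues = body.partition("\n")[2]  # drop the rest of the <400> line
        # Xaa is an unknown amino acid; an all-lowercase token containing
        # "n" is an unknown nucleotide ("Asn" is not all-lowercase).
        has_unknown = bool(
            re.search(r"\bXaa\b", residues)
            or re.search(r"\b[a-z]*n[a-z]*\b", residues)
        )
        if has_unknown and "<223>" not in head:
            flagged.append(int(id_match.group(1)))
    return flagged
```

Any sequence number the function returns would need a `<220>` to `<223>` section added before resubmission; an empty list means only that every record with n or Xaa has some `<223>` text.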
VERIFICATION SUMMARY

PATENT APPLICATION: US/10/031,496B

DATE: 03/12/2004
TIME: 14:56:15

Input Set : A:\NREL 99-45.ST25.txt
Output Set: N:\CRF4\03122004\J031496B.raw

L:195 M:258 W: Mandatory Feature missing, <223> Tag not found for SEQ ID#:4
L:195 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:4 after pos.:1535
L:450 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:18 after pos.:48
L:702 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:32 after pos.:48






